[57] ABSTRACT

Methods and systems are provided for synchronizing local 
copies of a distributed database, such as a master copy and 
a partial copy stored in a replica or in a cache. Each data item 
in the database has an associated timestamp or other tag. An 
index into the tags is maintained. The tag index may be used
to create an event list to reduce the time and bandwidth 
needed to synchronize the local copies. The tag index may 
also be used to create a virtual update log, thereby removing 
the need to maintain one or more physical logs recording the 
history of the copies. 

32 Claims, 4 Drawing Sheets 
DISTRIBUTED DATABASE USING INDEXED
INTO TAGS TO TRACKS EVENTS
ACCORDING TO TYPE, UPDATE CACHE,
CREATE VIRTUAL UPDATE LOG ON
DEMAND

The present invention relates to distributed database
computer systems, and more particularly to distributed database
systems which use indexed tags to track events according to
type, to update a cache of database data items, to construct
an update log on demand, and to provide other capabilities.

TECHNICAL BACKGROUND OF THE INVENTION

The potential advantages of distributed database systems
are well-known. As computing power becomes more widely
available at lower prices, the most cost-effective approach to
database implementation often involves harnessing many
connected processors together into one large system. Some
database uses, such as email or document management
capabilities, are inherently driven toward distributed
implementations. Distributing a database may also improve
reliability, since the failure of a single processor in a
distributed system will not necessarily bring to a halt all use
of the database. As a result, databases are often distributed
among connectable nodes in a network, with each node
receiving a replica of part or all of the database.

However, distributing database replicas creates the
problem of maintaining consistency, at least to some degree,
between the replicas. Steps must be taken to synchronize the
replicas so that a database query using one replica of the
database tends (or in some cases, is guaranteed) to give the
same result as a query using another replica of the database.
Aspects of database transaction synchronization are
discussed in commonly owned copending application Ser. No.
08/700,487 filed Sep. 3, 1996. Aspects of clash handling
during synchronization are discussed in commonly owned
copending application Ser. No. 08/700,489 filed Sep. 3,
1996. Commonly owned copending application Ser. No.
08/700,490 filed Sep. 3, 1996 discusses compression of
"physical" update logs, namely, logs which are created and
maintained more-or-less continuously during database
usage. These discussions are incorporated herein by
reference.

Caching part or all of a replica in memory, to reduce disk
accesses and/or network traffic, may dramatically reduce the
response time to a query. However, caching complicates
synchronization by increasing both the number and kind of
replicas present in the system. Both cached replicas and
replicas stored on disk must be updated to maintain adequate
consistency throughout the database. In addition, decisions
must be made about when to use the cache and when to use
the disk in response to a database query or update operation.

One synchronization method sends a list of cached
database object identifiers and corresponding timestamps or
sequence numbers from the caching node to a master node
which holds a master replica. The master node compares this
list with the list of objects in the master replica, compares the
timestamps of objects found in both replicas, and then uses
a physical update log to generate a list of update operations.
The list of update operations is sent back to the caching node
and applied to the cached database objects, thereby
synchronizing the cached replica with the master replica.

A major drawback of this synchronization method is that
some object identifiers and timestamps may be transmitted
even when the cached and master copies of the objects in
question are already synchronized. This wastes bandwidth,
memory, and processing cycles, particularly as the number
of cached objects grows. Another drawback is that flexible
caching policies are hard to implement because all updates
are treated the same way by all caches.

Using a physical update log to track operations on the
master replica also has disadvantages. Logs can be quite
large if they are not compressed, since each log must contain
at least one entry (recording object creation) for each object
in the replica. Even if log compression is used, physical
update logs may require substantial disk storage space on the
node. A synchronization checkpoint must also be maintained
in the log for each other replica that can synchronize with the
master replica. These checkpoints prevent a later update
from being merged into an earlier one when the checkpoint
falls between the two updates. They also reduce scalability
in the number of caching replicas.

It would therefore be an advancement in the art to provide
an improved method and system for distributed database
caching to reduce the amount of unnecessary data sent
between nodes.

It would be an additional advancement to provide such a
method and system which support caching policies that treat
specified updates differently from other updates.

It would be a further advancement to provide such a method
[illegible] continuously [illegible]

[illegible] distributed databases are [illegible]

BRIEF SUMMARY OF THE INVENTION

The present invention provides a method and system for
using indexed tags in a distributed database. "Tags" include
timestamps, version numbers, sequence numbers, update
reference numbers, transaction counters, and other means of
determining the relative order of two operations on a
database replica. "Indexes" include hash tables, balanced trees,
and other structures that allow relatively rapid and direct
access to a specific item in a large collection of items, based
on one or more key values.

Both conventional systems and systems according to the
invention may use indexes into database objects and records.
In conventional systems, the indexed items contain data
supplied by and/or sought by database users. This may also
be true in systems according to the invention, but the
invention also provides indexes into tags. To the best
knowledge of the inventors, conventional systems do not index
tags or use indexed tags to provide the capabilities disclosed
here.

One embodiment of a computer system according to the
invention includes a distributed database of objects or other
data items. For instance, the database may contain records,
since objects and records are interchangeable with regard to
most aspects of the invention. The database is divided into
local copies, such as replicas or partitions. A tag is associated
with each data item in a local copy. An update means is
provided for updating the tag for a given data item whenever
a database operation is performed on the given item, such
that the tag is no smaller than the largest tag currently in the
local copy. An index into the tags for the local copy is also
maintained.

Database update operations (also known as "events") are
categorized by type, and the index into the tags can be used
to quickly determine what events occurred after the event
that corresponds to a given tag value. The index can be used
to efficiently create a list of recent events, which can then be
sent to a master node to obtain the information needed to
update a local cache of database objects or records. Basing
the update on the list of recent events reduces the amount of
unnecessary data sent between nodes because information
about items that are still synchronized need not be
transmitted. The cache may also be updated periodically by a
cache manager, regardless of any requests to access the
cache.

Categorizing operations by type allows the system to use
a variety of caching policies which treat certain operations
differently from other operations. For instance, one cache
may add a data item to the cache each time an add event
occurs, while another cache only changes the cache when a
modify event occurs.

The index into the tags can also be used to create a
"virtual" update log on demand, reducing or eliminating the
need for continually maintaining a physical data log. This
dynamically created log can be used to synchronize two
copies of the database, and to handle clashes during
synchronization in much the same way as a physical log, but
without all the drawbacks of physical logs.

Each data item's tag may also be tailored to identify
which copy of the database the data item is located in, so that
updates to a given local copy can be grouped during
synchronization. This promotes locality of reference, which
leads to better caching performance. Other features and
advantages of the present invention will become more fully
apparent through the following description.

To illustrate the manner in which the advantages and
features of the invention are obtained, a more particular
description of the invention will be given with reference to
the attached drawings. These drawings only illustrate
selected aspects of the invention and thus do not limit the
invention's scope. In the drawings:

FIG. 1 is a diagram illustrating a network of computers
which is among the many systems suitable for use with the
present invention.

FIG. 2 is a diagram illustrating the indexing of data items
in a conventional system.

FIG. 3 is a diagram illustrating generally the indexing of
data items and tags in a system according to the present
invention.

FIG. 4 is a diagram illustrating in greater detail the
indexing of data items and tags in a system according to the
present invention.

FIG. 5 is a flowchart illustrating methods of the present
invention.

FIG. 6 is a diagram further illustrating a portion of an
inventive system such as the system shown in FIG. 1.

The present invention relates to a method and system for
using indexed tags in a distributed database. The invention
may be used with local area networks, wide area networks,
and/or the Internet. "Internet" as used herein includes
variations such as a private Internet, a secure Internet, a
value-added network, a virtual private network, an extranet, or an
intranet. The computers connected by the network may be
workstations, laptop computers, disconnectable mobile
computers, servers, mainframes, or a combination thereof.
The network may include one or more LANs, wide-area
networks, Internet servers and clients, intranet servers and
clients, peer-to-peer nodes, or a combination thereof.

One of the computer systems suited for use with the
present invention is indicated generally at 100 in FIG. 1. In
one embodiment, the system 100 includes Novell NetWare®
network operating system software (NETWARE is a
registered trademark of Novell, Inc.) and Novell Directory
Services software. In alternative embodiments, the system 100
includes NetWare Connect Services, VINES, TCP/IP,
Windows NT, Windows 95, LAN Manager, and/or LANtastic
network operating system software and/or an implementation
of a distributed hierarchical partitioned object database
according to the X.500 protocol or another directory service
protocol such as the Lightweight Directory Access Protocol
(VINES is a trademark of Banyan Systems; NT, WINDOWS
95, and LAN MANAGER are trademarks of Microsoft
Corporation; LANTASTIC is a trademark of Artisoft). The
system may include a local area network 102 which is
connectable to other networks 104, including other LANs or
portions of the Internet or an intranet, through a gateway or
similar mechanism.

[illegible]
connected by network signal lines 108 to one or more
network clients 110. The servers 106 and network clients
110 may be configured by those of skill in the art in a wide
variety of ways to operate according to the present
invention. The servers 106 may be configured as Internet servers,
as intranet servers, as directory service providers or name
servers, as software component servers, or as a combination
thereof. The servers 106 may be uniprocessor or
multiprocessor machines. The servers 106 and clients 110 each
include an addressable storage medium such as random
access memory and/or a non-volatile storage medium such
as a magnetic or optical disk.

Suitable network clients 110 include, without limitation,
personal computers 112, laptops 114, workstations 116, and
dumb terminals. The signal lines 108 may include twisted
pair, coaxial, or optical fiber cables, telephone lines,
satellites, microwave relays, modulated AC power lines, and
other data transmission "wires" known to those of skill in the
art. In addition to the network client computers 110, a printer
and an array of disks 120 are also attached to the system
100. A given computer may function both as a client 110 and
as a server 106; this may occur, for instance, on computers
running Microsoft Windows NT software. Although
particular individual and network computer systems and
components are shown, those of skill in the art will appreciate that
the present invention also works with a variety of other
networks and computers.

[illegible] network clients 110 are capable
of using floppy drives, tape drives, optical drives or other
means to read a storage medium 122. A suitable storage
medium 122 includes a magnetic, optical, or other
computer-readable storage device having a specific physical
substrate configuration. Suitable storage devices include
floppy disks, hard disks, tape, CD-ROMs, PROMs, random
access memory, and other computer system storage devices.
The substrate configuration represents data and instructions
which cause the computer system to operate in a specific and
predefined manner as described herein. Thus, the medium
122 tangibly embodies a program, functions, and/or
instructions that are executable by the servers 106 and/or network
client computers 110 to perform database management with
indexed tags substantially as described herein. Suitable
software for implementing the invention is readily provided
by those of skill in the art using the teachings presented here
and programming languages such as Java, Pascal, C++, C,
assembly, firmware, microcode, and/or other languages. 

FIG. 2 illustrates a conventional database system which 
includes a data index 200 into a set of data items 202. The 
data items 202 may include objects, records, or other col-
lections of data values. The data items 202 are principally
supplied by and/or sought by database users, as opposed to 
being principally used internally for administration, 
organization, and/or manipulation of the database. The data 
items 202 may be organized in a hierarchical database, a 
relational database, or another organization. 

The data index 200 includes one or more hash tables, 
balanced trees, and/or other structures that allow relatively 
rapid and direct access to a specific data item 202 or set of 
data items 202 based on one or more key data values. In an 
inventory or sales database, for instance, keys such as part 
number or item name could be used to index data items 202 
containing information such as cost, price, and current 
availability. 

Each data item 202 has an associated tag 204 (in alter- 
native embodiments, only selected data items are tagged). 
Each tag 204 value corresponds to an event in the history of 
the associated data item 202, such as the most recent update 
to the data item 202. The tags 204 may be designed to allow
recovery of earlier versions of the data item 202. The tags 
204 may also allow the system to determine which copy of 
a data item 202 is more recent, so that the most recent data 
can be propagated during synchronization. 

Tags 204 are typically restricted to internal use. They may
be accessed by administrators but are seldom or never seen
by most database users. Suitable tags 204 include timestamps,
version numbers, sequence numbers, update reference 
numbers, transaction counters, and other means of deter- 
mining the relative order of operations on data items 202. A 
transaction counter guarantees ordering of events and sup- 
ports synchronization because the counter is received by a 
cache or other database copy from a master database copy. 

FIGS. 3 and 4 show systems according to the present 
invention. In the conventional system shown in FIG. 2, the 
data index 200 is keyed on data item 202 values, not on tag 
204 values. By contrast, the tags 204, 400 in the inventive 
systems are indexed for rapid access. Systems according to 
the invention may use data indexes 200 into database items 
202 just as in conventional systems. However, in such cases 
the invention expands indexing to include indexes 300, 402
into the tags 204, 400. 

Suitable tag indexes 300, 402 include hash tables, bal- 
anced trees, and/or other structures that allow relatively 
rapid and direct access to a specific tag 204, 400 based on 
one or more key tag values. For tags such as the tags 400
which include a timestamp, a key tag value may be a
timestamp value, such as the timestamp corresponding to a
synchronization of master and slave replicas. For tags such
as the tags 400 which include an event identifier, a key tag
value may identify an event or a set of events, such as the
creation or modification of the corresponding data item 202. 

The tags 400, which contain an event identifier and a 
timestamp, are just one of many possible tags 204. In
alternative embodiments, a tag 204 contains a version
number, sequence number, update reference number or other
value that allows the system to determine the relative order
of operations on data items 202. The optional event identifier
in the tag 204 may identify events such as the addition,
modification, deletion, replication, security or permissions
modification, and/or reading of a data item 202.

For convenience, the tags 204, 400 are shown adjacent 
corresponding data items 202. This does not require physical 
adjacency. In some embodiments, the tags 204, 400 are
stored separately from their corresponding data items 202. 
In addition, the correspondence between tags 204, 400 and 
data items 202 need not be a simple one-to-one correspon- 
dence. In one embodiment, each data item 202 (such as each 
data object) is associated with a set of tags 204, 400, not with 
just one tag 204, 400. 

Moreover, additional indexes 300 may be used. For 
instance, in one embodiment an index 300 provides rapid 
access to the tags 204, 400 by using data object 202 
identifiers to index a table of tags 204, 400. This allows rapid 
updating of the tags 204, 400 when the data objects 202 are 
changed, and in particular, allows rapid tag updates when the 
security characteristics of data objects are changed. 

FIG. 5 illustrates several methods of the present invention 
for managing a distributed database of objects, records, or 
other data items 202. During a tagging step 500, a tag 204 
is associated with at least some of the data items 202 in the 
database. The tag 204 may be created and associated with 
the data item 202 when the data item 202 is initially added 
to the database, or the tag 204 may be added later. 

The tag 204 includes a timestamp, version number or 
other means of identifying the relative position of the 
associated data item operation in the sequence of operations 
on data items 202; this tag component is placed in the tag 
204 during a step 502. The tag 204 optionally also contains 
an event identifier which distinguishes between different 
types of events on data items 202; the event identifier is 
placed in the tag 204 during a step 504. 

During an updating step 506, the tag for a given data item 
202 is updated to reflect the fact that a database operation 
has been (or soon will be) performed on the given data item 
202. The updating step 506 may include steps similar to the 
steps 502, 504, for updating the timestamp or version and for 
updating the event identifier, respectively. 

Those of skill in the art will understand that the updating
step 506 need not be performed for every possible database 
operation, but rather only for those operations being tagged 
and managed according to the invention. Operations per- 
formed are not necessarily identified by the event identifier 
values. For instance, in one embodiment, read operations are 
performed but not tracked, while write, security modify, and 
delete operations performed are tracked and are each iden- 
tified by separate event identifier values. In this embodiment 
the updating step 506 is not done after a read operation, but 
tags 204 are updated to reflect write, security modify, and/or 
delete operations. 

In one embodiment, the tag 204 for a given data item 202 
is updated during the step 506 to reflect the operation on that 
data item 202 such that the tag 204 in question is no smaller 
than the largest tag currently in the local copy of the 
database. That is, the tag values form an increasing 
sequence, with the larger tag value(s) corresponding to the 
most recently performed operation(s). For clarity of 
discussion, reference is made principally to increasing 
sequences, but those of skill in the art will recognize that a
decreasing sequence could also be used, since the two 
approaches (increasing or decreasing) are readily inter- 
changeable. 
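By way of illustration only, the updating step 506 might be sketched as follows for timestamp tags. The sketch assumes integer clock readings and a clock that can step backward; the names `LocalCopy`, `update_tag`, and `clock_reading` are hypothetical and do not appear in the specification.

```python
class LocalCopy:
    """Hypothetical local copy holding one timestamp tag per data item."""

    def __init__(self):
        self.tags = {}         # item_id -> current tag value
        self.largest_tag = 0   # largest tag value currently in this copy

    def update_tag(self, item_id, clock_reading):
        """Retag an item so its tag is never below the largest existing tag."""
        new_tag = max(clock_reading, self.largest_tag)
        self.tags[item_id] = new_tag
        self.largest_tag = new_tag
        return new_tag


copy = LocalCopy()
copy.update_tag("obj1", 100)
copy.update_tag("obj2", 105)
copy.update_tag("obj1", 103)   # clock stepped back; the tag is clamped to 105
```

Clamping to the current maximum keeps the tag values in a non-decreasing sequence even when the underlying clock is not monotonic. A pure sequence counter would achieve the same ordering without reference to a clock.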

A tag-indexing step 508 maintains a tag index 300 into the 
tags 204 for at least the local copy of the database. Tags 204 
may be indexed according to timestamp or other tag values 
in the increasing sequence. If the tags 204 contain event 
identifiers, the tags 204 may also be indexed on the event 
identifiers, either according to individual event types (e.g., 
create, write data, write permissions, delete, rename) or 
according to sets of event types (e.g., data-item-value-
modifying events versus data-item-location-modifying
events). 
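One possible shape for an index built in the tag-indexing step 508 is sketched below. It assumes integer tag values and uses one sorted list per event type in place of a balanced tree. The class `EventTypeIndex`, its methods, and the grouping of event types into `VALUE_EVENTS` and `LOCATION_EVENTS` are hypothetical.

```python
import bisect
from collections import defaultdict

# Hypothetical grouping of individual event types into sets.
VALUE_EVENTS = frozenset({"write_data", "write_permissions"})
LOCATION_EVENTS = frozenset({"rename"})


class EventTypeIndex:
    """One ascending list of (tag value, item id) pairs per event type."""

    def __init__(self):
        self._by_type = defaultdict(list)

    def add(self, tag_value, event_type, item_id):
        bisect.insort(self._by_type[event_type], (tag_value, item_id))

    def after(self, tag_value, event_types):
        """Entries of the given types whose tag value exceeds `tag_value`."""
        hits = []
        for event_type in event_types:
            entries = self._by_type.get(event_type, [])
            # (tag_value + 1,) sorts before any (tag_value + 1, item_id).
            start = bisect.bisect_left(entries, (tag_value + 1,))
            hits.extend(entries[start:])
        return sorted(hits)


idx = EventTypeIndex()
idx.add(1, "create", "obj1")
idx.add(2, "write_data", "obj1")
idx.add(3, "rename", "obj1")
idx.add(4, "write_permissions", "obj2")
idx.add(5, "write_data", "obj3")
value_changes = idx.after(2, VALUE_EVENTS)
```

Passing a single-element collection selects by individual event type, while passing one of the named sets selects by set of event types.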

The indexed tags are used during a step 510. For purposes 
of illustration, two uses of the indexed tags are shown in 
FIG. 5: updating a cache and synchronizing replicas. These 
uses are not mutually exclusive, and other uses of the 
indexed tags will also be apparent to those of skill in the art. 

With regard to cache updates, an event list is created 
during a step 512. Each element in the event list contains 
enough information to allow a cache manager to recreate the 
corresponding event, in order to duplicate the effect of the 
event and thus update the cache. Rather than sending all 
information on all operations between a computer holding a 
cache (such as the client 110 in FIG. 1) and a computer 
holding the master copy of cached data items 202 (such as 
the server 106), the present invention allows a cache man- 
ager to rapidly create a list containing only specified events. 
This reduces bandwidth usage and processing time, making 
it possible to update the cache more frequently. 

Events may be selected in various ways for membership 
in the event list. Three constraining steps 514, 516, 518 are 
shown in FIG. 5; other suitable constraints will also be 
known to those of skill in view of the teachings herein. 
Constraining steps may be applied individually or in com- 
bination. Explicit constraining steps may also be omitted. 
For instance, a constraint may be implicit in the associating 
step 500. Constraining steps may be implemented through 
filtering at either the tag index 300 level or the list level. It 
is preferred that filtering be done at the tag index 300 level
when possible, so that links from the tag index 300 to the
tags 204 are followed only when the tag 204 is likely to
meet the constraint(s). 

A step 514 constrains event list membership using the 
timestamp, version number or similar tag component dis- 
cussed in connection with step 502. For instance, the list 
may contain only events that occurred after a specified 
event, or only events that occurred before a specified event, 
or only events that occurred between two specified events. 
In particular, the tag index 300 can be used to quickly 
determine what events occurred after the most recent cache 
update. 

If database operations are categorized by event type as 
discussed in connection with the step 504, a constraining 
step 516 may also be applied to limit list membership by 
event type. For instance, the step 516 may ensure that only 
events which modify data item 202 contents are placed in the 
event list. Constraining steps 514 and 516 can also be 
combined to produce a list of recent events constrained by 
event type; an event appears in the list only if the event is of 
one or more selected event types and is also sufficiently 
recent. 

To reduce memory, bandwidth, and per-update processing 
requirements, a step 518 may constrain the event list by 
limiting the maximum number of events placed in the list. 
This may be combined with other constraints to produce, for 
instance, a list of the ten most recent changes to data item 
202 permissions. 
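The three constraining steps can be combined in a single selection routine, sketched below. For brevity the sketch filters at the list level, whereas the description prefers filtering at the tag index 300 level when possible. The names `Event`, `build_event_list`, `after_seq`, `kinds`, and `max_events` are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Event(NamedTuple):
    seq: int        # timestamp or version component of the tag (step 502)
    kind: str       # event identifier (step 504)
    item_id: str


def build_event_list(
    tags_in_order: Iterable[Event],
    after_seq: Optional[int] = None,     # step 514: only later events
    kinds: Optional[Set[str]] = None,    # step 516: only these event types
    max_events: Optional[int] = None,    # step 518: cap on list length
) -> List[Event]:
    """Select events from tags already sorted by ascending seq."""
    selected = [
        e for e in tags_in_order
        if (after_seq is None or e.seq > after_seq)
        and (kinds is None or e.kind in kinds)
    ]
    if max_events is not None:
        # Keep the most recent events, still in ascending order.
        selected = selected[-max_events:] if max_events > 0 else []
    return selected


history = [
    Event(1, "add", "obj1"),
    Event(2, "modify", "obj1"),
    Event(3, "permissions", "obj2"),
    Event(4, "modify", "obj3"),
    Event(5, "permissions", "obj1"),
]
latest_permission_changes = build_event_list(
    history, after_seq=1, kinds={"permissions"}, max_events=10
)
```

Each constraint is optional, which mirrors the statement that the constraining steps may be applied individually, in combination, or omitted.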

During a step 520, the cache of data items 202 is updated 
using the event list. As illustrated by step 522, the cache may 
be updated in response to a request to access the cache. As 
illustrated by step 524, the cache may also be updated 
periodically by a cache manager, regardless of any requests 
to access the cache. The two approaches may also be 
combined, so the cache is updated at least as frequently as 
some predetermined period and is updated more frequently 
when access requests occur more frequently than the period. 
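The combination of steps 522 and 524 can be sketched as a small scheduler. The sketch assumes the caller supplies the current time explicitly, which keeps the example deterministic. `CacheRefresher`, `on_access`, and `on_tick` are hypothetical names, and the `update` callable stands in for the updating step 520.

```python
class CacheRefresher:
    """Update on every access request and at least once per period."""

    def __init__(self, period, update):
        self.period = period   # maximum time allowed between updates
        self.update = update   # callable that performs the cache update
        self.last = None       # time of the most recent update, if any

    def on_access(self, now):
        # Step 522: an access request triggers an update.
        self._run(now)

    def on_tick(self, now):
        # Step 524: the cache manager updates when a full period has elapsed.
        if self.last is None or now - self.last >= self.period:
            self._run(now)

    def _run(self, now):
        self.update()
        self.last = now


calls = []
refresher = CacheRefresher(5.0, lambda: calls.append("update"))
refresher.on_tick(0.0)     # first tick: update
refresher.on_tick(2.0)     # too soon: no update
refresher.on_access(3.0)   # access request: update
refresher.on_tick(6.0)     # only 3.0 since the last update: no update
refresher.on_tick(8.0)     # 5.0 since the last update: update
```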
To more fully illustrate the benefits of this method for
updating caches, consider a system such as the system 600 
shown in FIG. 6. The system 600 includes a master system 
602 denoted "A" and a second master system 604 denoted
"B". Each master system could be implemented using a
server 106 and replicated database software such as NDS
software, GroupWise software, or other software which uses
a master copy of data items 202 to promote consistency 
among distributed copies of the data items 202. The data 
items 202 are stored in database replicas 606. 

The system 600 also includes two client caches 608, 610, 
which reside on clients 110. Each cache 608, 610 contains 
cached data 612, 614, respectively, which includes copies of 
at least some of the data items 202. The cached data 612, 614 
may be in the same format as the replicas 606, but this is not 
required for use of the invention. For instance, one database 
format could be used on the master systems 602, 604 while 
a different database format (even from a different vendor) 
could be used on the clients 110. Or each of the four 
locations 602, 604, 608, 610 could use a different database 
format. 

Each data item 202 in the master replicas 606 and each 
data item 202 in the cached data 612, 614 has a correspond- 
ing tag 204. Each cache 608, 610 also contains a token 616, 
618 for each of the master systems 602, 604 that holds the 
master replica for data items 202 in the cache 608, 610. 
These tokens 616, 618 are used to help synchronize the 
cached data 612, 614 with the master replicas 606, as 
explained below. 

To facilitate quick cache synchronization, the system 600 
uses indexed tags 204. Tags 204 stored on a given master 
system 602 or 604 record the most recent changes that have 
occurred for each data item 202 stored in that master system. 
For each tracked event type that can occur on data items 202 
in the master system, the logging facility on the master 
system stores a tag 204 value equal to the highest database
sequence number at the time of the event for the data item 
202 being operated on. If the same event occurs on the same 
data item 202 at a later time, the previous tag 204 value is 
simply changed to the current database sequence number 
because only the last occurrence of the event is needed for
cache synchronization. 
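A minimal sketch of this logging behavior follows. It assumes that each tracked event advances the database sequence number by one, which the description does not state, and it uses a linear scan where a real system would consult the tag index. `MasterTagLog`, `record`, and `events_after` are hypothetical names.

```python
class MasterTagLog:
    """Keeps only the latest sequence number per (item, event type) pair."""

    def __init__(self):
        self.sequence = 0   # database sequence number (assumed +1 per event)
        self.tags = {}      # (item_id, event_type) -> sequence number

    def record(self, item_id, event_type):
        self.sequence += 1
        # A repeat of the same event on the same item overwrites the old tag.
        self.tags[(item_id, event_type)] = self.sequence
        return self.sequence

    def events_after(self, sequence):
        """Return (sequence, event_type, item_id) entries above `sequence`."""
        return sorted(
            (seq, event_type, item_id)
            for (item_id, event_type), seq in self.tags.items()
            if seq > sequence
        )


log = MasterTagLog()
log.record("obj1", "modify")   # sequence 1
log.record("obj2", "add")      # sequence 2
log.record("obj1", "modify")   # sequence 3 replaces the tag written at 1
pending = log.events_after(0)
```

Because the earlier modify tag is overwritten, the storage needed grows with the number of tracked (item, event type) pairs rather than with the number of operations performed.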

The tags 204 are indexed to allow a quick lookup to 
determine what events occurred after a given database 
sequence number. Thus, a cache site 608 or 610 can send a 
request to the master system 602 or 604 to get a list of the 
most recent events that occurred on data items 202 since the 
last time the cache made an inquiry. The cache's request can 
specify the event types that should be returned. This allows, 
for example, one cache to add data items 202 to the cache 
each time an add event occurs while another cache may only 
update data items 202 in the cache when a modify event 
occurs. 

The request from the cache 608, 610 to the master system 
602, 604 can also specify the maximum number of events to 
return. Limiting the number of events is important because 
the cache may use a synchronization thread that allows its 
cache manager to process only a few events during each 
pass, thereby allowing CPU use by other processes. 

The list of events returned from the master system 602,
604 can be used to determine which operations should be 
performed on the cache 608, 610 to bring it into sync with 
the master system. The following table provides an example 
event list returned from the master system and possible 
operations to perform for each returned event: 
Event List Returned from Master

Event                   Object     Action Performed by Cache

EVENT_ADD               Object 1   read Object 1 and add it to the cache
EVENT_MODIFY            Object 5   read Object 5, update it in the cache
EVENT_PURGE             Object 2   delete Object 2 from the cache
EVENT_CHANGE_SECURITY   Object 9   if Object 9 is in cache check if user
                                   still has rights to it. If not, remove the
                                   object from the cache.
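The actions in the table can be sketched as a dispatch over the returned events. The sketch assumes a dictionary-based cache and treats both add and modify events as storing a fresh copy read from the master. The function `apply_events` and its callables `read_from_master` and `user_has_rights` are hypothetical.

```python
def apply_events(cache, events, read_from_master, user_has_rights):
    """Apply (event, object_id) pairs to a dict-based cache, in order."""
    for event, object_id in events:
        if event in ("EVENT_ADD", "EVENT_MODIFY"):
            # Read the current object from the master and store it.
            cache[object_id] = read_from_master(object_id)
        elif event == "EVENT_PURGE":
            cache.pop(object_id, None)
        elif event == "EVENT_CHANGE_SECURITY":
            # Drop the object only if it is cached and rights were lost.
            if object_id in cache and not user_has_rights(object_id):
                del cache[object_id]
    return cache


master = {"Object 1": "v1", "Object 5": "v2"}
cache = {"Object 2": "old", "Object 5": "v1", "Object 9": "secret"}
events = [
    ("EVENT_ADD", "Object 1"),
    ("EVENT_MODIFY", "Object 5"),
    ("EVENT_PURGE", "Object 2"),
    ("EVENT_CHANGE_SECURITY", "Object 9"),
]
apply_events(cache, events, master.__getitem__, lambda object_id: False)
```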



Each time the cache 608, 610 makes a request it is given 
a new token 616 or 618 representing the most recent event 
returned. The cache 608, 610 later uses this token 616, 618 
to begin navigating the sequence of events starting from the 
point where the last request left off. Because the master 
system 602, 604 may hold multiple databases 606, each 
token 616, 618 is encoded with sequence numbers from each 
database 606 in the master system. After the cache 608, 610 
successfully processes all events in the returned list, the 
appropriate token is stored in the cache database 612, 614, 
respectively, to be used for the next synchronization request. 

In one embodiment, the system 600 uses a current set of 
tokens 616, 618 which were stored in the caches 608, 610 
from the previous synchronization step 510. The caches 608, 
610 send their current tokens to the master system(s) 602, 
604 as part of the synchronization request(s). In response, 
each requested master system creates the event list and sends 
it back to the requesting cache. The cache manager then uses 
the event list to update the cached data. 
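The token exchange can be sketched as follows, assuming a token is a mapping from database name to the last sequence number returned from that database. The class `Master`, the method `sync`, and this token layout are hypothetical; the description does not specify how tokens are encoded.

```python
class Master:
    """Hypothetical master system holding several tagged databases."""

    def __init__(self, databases):
        # database name -> list of (seq, event, item_id), ascending by seq
        self.databases = databases

    def sync(self, token, max_events):
        """Return up to `max_events` events after `token`, plus a new token."""
        events, new_token = [], dict(token)
        for name, entries in self.databases.items():
            for seq, event, item_id in entries:
                if len(events) == max_events:
                    return events, new_token
                if seq > token.get(name, 0):
                    events.append((name, event, item_id))
                    new_token[name] = seq
        return events, new_token


master = Master({
    "A": [(1, "add", "obj1"), (2, "modify", "obj1")],
    "B": [(1, "add", "obj7")],
})
token = {}
# A cache would store each new token only after processing the events.
first, token = master.sync(token, max_events=2)
second, token = master.sync(token, max_events=2)
third, token = master.sync(token, max_events=2)
```

Capping each response at `max_events` lets the cache manager process a few events per pass, and the stored token lets the next request resume where the previous one stopped.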

One embodiment of a system 600 includes Novell Group- 
Wise software. The system 600 allows a user (either a 
human or a software agent) to publish documents to the 
Internet from a GroupWise document management module.
The system 600 automatically converts the documents from 
over many different file formats to HTML format. The 
HTML version is returned to the client 110 so it may be 
viewed from any HTML browser. The present invention is 
used to cache converted documents because the conversion 
stage can take more than a minute for complex documents. 
Each time a list of events is returned from the master system, 
the cache manager converts the appropriate document to 
HTML and stores it in the cache. The system supports 
multiple publishing sites (caches) such as an Internet and an 
intranet that service the same master systems. The master 
systems and their clients may use proprietary database 
formats, or they may use commercially available formats 
such as those found in Oracle databases and in the Microsoft 
Windows registry (trademarks of their respective owners). 
As noted, various databases can also be used at different 
sites. 

Use of the invention is beneficial because a user can add 
a document to the GroupWise library and the document will
be automatically converted and placed in the cache. Because 
the document will likely be added to the cache before an 
Internet user requests the document, the perceived access 
time for documents in the library is instant. 

A conventional solution to this synchronization problem 
would send a list of objects in the cache to the master to 
determine which ones are out of date. The master would then 
compare this list with the objects in the master system and 
return a list of operations to perform for each object. By 
using the present invention, a smaller amount of data can be 
sent to the master to quickly determine what needs to be 
synchronized in the cache. In one embodiment, the system
600 allows synchronization intervals as frequent as every
five seconds because the request requires minimal overhead.
The conventional solution of sending a list of objects to the
server requires much more time be spent compiling a list
from the cache, requires much more data in the request
because information about each object in the cache must be
sent, and requires much more time be spent comparing
objects sent from the cache with objects on the master to
create a synchronization plan.

Returning now to FIG. 5, the indexed tags are also used 
during step 510 to create an update log used to synchronize 
database copies. During a step 526, the tag index 300 is used
to create a dynamic update log. The dynamic update log is
then used during a step 528 to synchronize two copies of the 
database. The copies may include replicas such as database 
replicas 606 stored on non-volatile media, or cached data 
612, 614 stored in fast but volatile media such as RAM, or 
a combination thereof. 
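
Steps 526 and 528 can be pictured with the following Python sketch. It
assumes the simplest tag form, an integer taken from a counter that grows
with each update, and it keeps the tag index as a sorted list. The names
Copy, dynamic_log, and synchronize are hypothetical, and the sketch omits
deletion and clash handling.

```python
import bisect
from typing import Dict, List, Tuple


class Copy:
    """Hypothetical local copy: data items, one integer tag per item, a tag index."""

    def __init__(self) -> None:
        self.items: Dict[str, str] = {}
        self.tags: Dict[str, int] = {}          # item key -> tag (local use only)
        self.index: List[Tuple[int, str]] = []  # sorted (tag, item key) pairs
        self.counter = 0                        # largest tag issued so far

    def update(self, key: str, value: str) -> None:
        if key in self.tags:
            # Drop the old entry so the index holds one entry per item.
            # (A linear scan is acceptable in a sketch; a real index would not scan.)
            self.index.remove((self.tags[key], key))
        self.counter += 1
        self.items[key] = value
        self.tags[key] = self.counter
        # The counter only grows, so appending keeps the index sorted.
        self.index.append((self.counter, key))

    def dynamic_log(self, after: int) -> List[Tuple[int, str, str]]:
        """Step 526: read items in tag order, starting just past tag `after`."""
        start = bisect.bisect_left(self.index, (after + 1, ""))
        return [(tag, key, self.items[key]) for tag, key in self.index[start:]]


def synchronize(source: Copy, target: Copy, last_seen: int) -> int:
    """Step 528: apply the source's dynamic log; return the new last-seen tag."""
    for tag, key, value in source.dynamic_log(last_seen):
        target.update(key, value)  # the target assigns its own local tag
        last_seen = tag
    return last_seen
```

The value returned by synchronize() plays the role of the remembered
update reference: passing it back on the next call yields only the items
updated since the previous synchronization.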

The benefits of using dynamic logs according to the 
invention may be understood by comparing dynamic logs 
with two other approaches to maintaining the information 
that is needed to synchronize replicas or other copies of data 
in a distributed database. One alternative maintains a physi- 
cal log of updates instead of maintaining a dynamic log; the 
physical log grows on disk as updates are made to the 
replicas. A second alternative uses a state-based synchroni- 
zation method which does not explicitly construct a log and 
(more importantly) detects changes by remote comparison
of the database copies instead of using local update
references or other tags.

The present invention differs from implementations 
which use physical update logs by not requiring that a 
physical list of update events be maintained. Instead, the
tags 204 and tag indexes 300 provide sufficient information
to construct a virtual log dynamically. The invention differs
from implementations that use state-based synchronization
by providing a log that makes synchronization possible
without comparing all state information, that is, all data item
202 values.
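
The contrast with state-based synchronization can be sketched in Python
as follows. Both functions are hypothetical illustrations: the first
compares every data item value on the two copies, while the second reads
only the tag index entries past the last update reference already seen.
Integer tags and a sorted list of (tag, key) pairs are assumed, and
deletions are ignored.

```python
import bisect
from typing import Dict, List, Tuple


def state_based_diff(source: Dict[str, str], target: Dict[str, str]) -> List[str]:
    """Compare all item values on both copies (the work the tag index avoids)."""
    return sorted(key for key in source if target.get(key) != source[key])


def tag_based_diff(index: List[Tuple[int, str]], last_seen: int) -> List[str]:
    """Read only the index entries whose tag is greater than last_seen."""
    start = bisect.bisect_left(index, (last_seen + 1, ""))
    return sorted(key for _tag, key in index[start:])
```

In this sketch the cost of state_based_diff grows with the size of the
database, whereas tag_based_diff touches only the entries for items
updated since the last synchronization.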

The maintenance of a physical log poses several prob- 
lems. First, the physical log requires storage space on the 
maintaining location. The space needed increases with the 
number of replica locations that must be synchronized.
Second, significant computational resources are needed to
maintain the log with acceptable scalability. For instance,
some kind of log compression may be needed. Third,
per-replica structures must be maintained within the update
log for each other replica that may synchronize with the
replica in question. This is necessary in order to maintain
"log checkpoints" when performing compression. Such
checkpoints prevent the effects of a later update from being
compressed into an earlier update if the current
synchronization point for a replica (the log location up to
which that replica is synchronized) falls between checkpoints.
This per-replica data impacts scalability in the number of
replicas.
The present invention makes it possible to do without a
physical log by constructing a virtual log dynamically. Each
data item 202 in the database 606 has associated with it an
update reference or other tag 204. Each tag 204 may be
implemented as an attribute of the data item 202, or it may
be associated with the data item 202 in another way, such as
by a pointer or by a fixed relationship between the location
of a given data item 202 and the location of that data item's
tag 204. The tag is not part of the synchronized data 202; the
tag 204 is only used locally.

Each tag 204 holds a value from some ordered set. In the
simplest case, the tag values are simply integer values.
Whenever a data item 202 is updated, the associated update
reference or other tag 204 is updated to a value no smaller
than the largest value currently existing in the local copy
(such as 606, 612, 614) of the database. In the simplest form,
this amounts to incrementing a global count and putting the
incremented value in the update reference 204. Each pair of
copies that are to be synchronized with respect to one
another maintain between them the last update reference
value each has seen from the other.

Each database replica maintains a tag index 300 into the
update references 204 of its data items 202. This provides an
efficient update-time ordering (partial ordering) of all data
items 202 in the database on a particular replica.

In a preferred implementation, the synchronization
topology is a star with one master location such as the master
system 602 acting as a synchronization hub for many replica
locations such as the locations 608, 610. In this case,
scalability in the number of replicas may be improved by
having each replica hold the latest update reference 204
which it has seen from the master system as well as the latest
update reference 204 the master system has seen from that
replica. This leaves the master system with no per-replica
state information to maintain. These remembered update
references effectively do the job of synchronization
checkpoints in systems that use physical logs. After
synchronization the next value used to replace an update reference 204
at the source (master) location on the next update to a data
item 202 at the source should be strictly greater than the last
value used, in order to avoid changes being synchronized
multiple times.

In order to synchronize, a virtual update log is
dynamically constructed by reading all updated data items 202 in
update reference order beginning immediately after the last
one that has already been seen. This update log contains
complete data items 202 which may be used to replace the
corresponding data items 202 (or create new ones) at the
target location during synchronization.

The synchronizing step 528 may include a clash detecting
step 530 to detect inconsistencies in the replicas that cannot
be resolved simply by updating a data item's value. For
instance, a data item 202 may have been renamed in one
replica and deleted from the other replica. Suitable clash
handling methods are described in commonly owned
copending application Ser. No. 08/700,489 filed Sep. 3,
1996 and incorporated herein by reference.

By appropriate choice of the update reference 204 type
used, a number of desirable properties may be achieved.
Using a simple integer makes tags 204 relatively easy to
understand and provides a measure of update order
preservation. An alternative embodiment uses an integer tag 204
that is only incremented when another replica requests
synchronization. Thus all data items 202 updated between
particular (virtual) checkpoints have the same update
reference. In effect, the update reference 204 becomes a
checkpoint. This minimizes the effort needed to maintain the index
300 on the update references 204 by minimizing changes to
the references 204.

Another alternative is to use tags 204 containing
compound values. Such tags 204 may be used during the
grouping step 532 to partition the database by using a tuple
value (X, Y) as the update reference 204, where X is a
partition identifier and Y is the simple update reference with
the properties discussed above. The X values are assumed
also to be (possibly arbitrarily) ordered and a comparison of
tuples proceeds by considering the X component as the most
[...] sign could be used. For instance, this tag 204 format could
be used when the database represents a file system and the
partition reference identifies a subtree. This tag 204 format
makes it relatively easy to keep together updates to each
subtree during the synchronization process, thus promoting
locality of reference which can help caching performance.

A compound tag 204 format can also be used to allow
separate partitions of the database to be independently
synchronized during the step 528. In this case particular
attention must be paid to move or rename operations, since
they can move a data item 202 between partitions. Multiple
tag indexes 300 may be maintained in connection with
compound tags 204 to provide further flexibility in
synchronization order. For instance, some replicas may favor
locality of reference while others favor selective synchronization
based on event type or frequency.

The virtual dynamic log of the present invention has
several advantages over a continually updated physical log.
Continual logging is difficult to extend from central (star)
synchronization topologies to peer-to-peer or any-to-any
synchronization because of the log's role in clash handling.
In resolving a clash, the replica's log is modified to generate
a new non-clashing sequence (effectively rewriting history).
This works because the replica only synchronizes with one
master. To synchronize with several masters the replica
would need to keep different logs for each master, because
it may successfully synchronize an update sequence to one
master but need to modify that update sequence to
synchronize with another master. Keeping a different continual log
for each master requires multiple physical copies. With a
dynamically constructed log, this simultaneous maintenance
of multiple histories is implicit in the tags 204.

For similar reasons a master continual log must maintain
checkpoints across which no compression occurs. The
checkpoints mark the last points seen by each replica the
master will sync with. The update history must be stable
across checkpoints; updates moved across checkpoints
might be missed during a synchronization. The size of the
continual log thus depends in part on the number of replicas,
a dependency that impairs scalability. Such checkpoints
are not needed with dynamic logs because each replica gets
its own virtual log, and that virtual log is suitable for the
[...] synchronization is happening.

In addition, there is a significant performance overhead in
maintaining the dependency information needed for logical
compression in a continual log. This both adds to the log's
storage size and reduces the update throughput. The
dynamic log turns this complex activity into a relatively
simple index updating activity and trades update-time
overhead for a relatively small amount of sync-time overhead.

In practice a virtual log may also require less storage
space than a continual log. The virtual log is only needed
during synchronization. The space required by tag indexes
300 and other structures may be less than that required for
dependency tracking. Also, the virtual log need not exist in
its entirety in one location. The virtual log is generated by
reading the database through a specialized index; the
underlying data is still just the data item values. The virtual log is
just an ordering of that data and never needs separate form.
Likewise, reading a specific portion of the virtual log is done
by filtering the database through a particular range key on
the index 300 concerned, so the log doesn't have to be
physically assembled in one contiguous region in memory or
on disk.

The dynamic log approach adds one update reference 204
to each data item in a replica, as well as adding to each
replica a partially-ordered index into the update references,
and a global variable containing the value of the largest
update reference currently in the replica. None of these
structures are required by the continual log approach.
Storage requirements for the global variable are negligible.
Storage needed to hold the index depends (like storage for
a continual log) on how much update activity occurs.
However, the size of the tag index 300 will still be less than
the size of a log for a given number of updates. Moreover,
the size of the tag index 300 will only ever be proportional
to the number of data items 202 in the system and in no way
dependent on the number of updates it has experienced. This
results in lower space requirements; one effectively gets log
compression for free.

In summary, the present invention provides a novel
system and method for synchronizing caches, replicas, and
other copies of portions of a distributed database. The
invention uses tag indexes to provide rapid and efficient
access to the data items that are needed for synchronization.
Methods of the invention are used for synchronizing copies
of the database, such as (a) a master copy and one or more
slave copies, (b) two or more copies in a peer-to-peer
configuration, and (c) a master copy and one or more cached
copies. A given copy may be a subset of another copy, may
overlap the other copy, or may have the same extent as the
other copy.

With regard to cache synchronization, benefits of the
invention include the fact that a master system doesn't need
to know the update policy or current state of a cache.
Minimal data is stored in the cache and sent to the master
system to determine what to change in the cache.
Determining what to change in the cache uses a quick database lookup
based on the tag index, allowing much more frequent cache
synchronization. Synchronizing this cache frequently is
important because the user should not perceive a difference
between the cache and the master. All cache synchronization
can be initiated and managed by the cache, and an unlimited
number of cache sites can serve an unlimited number of
master systems. In addition, data items can be added to or
updated in the cache before a user requests it from the cache.
This prevents the first access from incurring the cost of
fetching the data item from the master at the time the user
is waiting for a response from the request.

With regard to replica synchronization generally, one
aspect of the invention differs from implementations which
use physical update logs by removing most of the overhead
and scalability problems associated with the maintenance of
such logs. It also allows more flexibility in what is
synchronized and in what order. The invention differs from
implementations that use state-based synchronization by
dramatically improving database size scalability through not having
to perform remote state comparisons.

Although particular methods embodying the present
invention are expressly illustrated and described herein, it
will be appreciated that apparatus and article embodiments
may be formed according to methods of the present
invention. Unless otherwise expressly indicated, the description
herein of methods of the present invention therefore extends
to corresponding apparatus and articles, and the description
of apparatus and articles of the present invention extends
likewise to corresponding methods.

The invention may be embodied in other specific forms
without departing from its essential characteristics. The
described embodiments are to be considered in all respects
only as illustrative and not restrictive. Any explanations
provided herein of the scientific principles employed in the
present invention are illustrative only. The scope of the
invention is, therefore, indicated by the appended claims
rather than by the foregoing description. All changes which
come within the meaning and range of equivalency of the
claims are to be embraced within their scope.

What is claimed and desired to be secured by patent is:

1. A method for managing a distributed database of data
items, including the computer-implemented steps of
associating tags with at least some of the data items in the
database, the tags determining the relative order of at least
two operations on data items in the database; in a local copy
of the database, updating the tag for a given data item; and
maintaining an index into the tags for the local copy.

2. The method of claim 1, wherein database operations are
categorized by event type, and the index into the tags can be
used to quickly determine what events occurred after the
event corresponding to a given tag value.

3. The method of claim 2, further including the
computer-implemented step of using the index to create a list of recent
events.

4. The method of claim 3, further comprising the
computer-implemented step of using the list of recent events
to update a cache containing at least one of the database data
items.

5. The method of claim 4, wherein the cache is updated in
response to a request to access the cache.

6. The method of claim 4, wherein the cache is updated
periodically by a cache manager, regardless of any requests
to access the cache.

7. The method of claim [...]
receiving a token from a master system, the token
representing the most recent event returned from the master
[...]

8. The method of claim 3, [...]
constrained by event type, so an event appears in the list
[...]

9. The method of claim 3, [...]
constrained by number, so that not more than a
predetermined maximum number of events appears in the list.

10. The method of claim [...] performed on the given
[...]

11. The method of claim [...]
the database requests synchronization with the local copy
containing the tags.

12. The method of claim 1, further including the step of
using the index to create a dynamic update log.

13. The method of claim 12, further including the step of
using the dynamic update log to synchronize two copies of
the database with each other.

14. The method of claim 12, wherein the dynamic update
log is used to handle clashes while synchronizing copies of
the database.

15. The method of claim 12, wherein each tag also
identifies which copy of the database the data item is located
in, so that updates to a given copy can be grouped during
synchronization.

16. A computer system comprising a collection of data 
items having associated tags for determining the relative 
order of two operations on a database replica, and an index 
into the tags. 

17. The system of claim 16, wherein the data items reside 
in a hierarchical database. 

18. The system of claim 16, wherein the data items 
comprise HTML documents. 

19. The system of claim 16, wherein the tags include 
compound tags. 

20. The system of claim 19, wherein each compound tag 
includes a database partition identifier and an update refer- 
ence. 

21. The system of claim 16, further comprising a syn- 
chronization means for synchronizing local copies of a 
database. 

22. The system of claim 21, wherein the tags include 
compound tags and the synchronization means supports 
independent synchronization of separate partitions of the 
database. 

23. The system of claim 21, wherein the synchronization 
means comprises means for updating a cache. 

24. The system of claim 21, wherein the synchronization 
means comprises means for updating a replica. 

25. The system of claim 21, wherein the synchronization 
means comprises means for creating an event list. 
26. The system of claim 25, wherein the synchronization
means comprises means for constraining membership in the 
event list. 

27. The system of claim 25, wherein the synchronization 
means comprises means for creating a dynamic update log. 

28. A computer storage medium having a configuration 
that represents data and instructions which will cause at least 
a portion of a computer system to perform method steps for 
managing a distributed database of data items, the method 
steps comprising associating tags with at least some of the 
data items in the database, the tags determining the relative 
order of operations on data items in the database; in a local 
copy of the database, updating a tag for a given data item; 
and maintaining an index into the tags. 

29. The storage medium of claim 28, wherein the method 
steps further comprise the step of using the index to create 
a list of recent events. 

30. The storage medium of claim 29, wherein the method 
steps further comprise the step of using the list of recent 
events to update a cache containing at least one of the 
database data items. 

31. The storage medium of claim 28, wherein the method
steps further comprise the step of using the index to create
a dynamic update log.

32. The storage medium of claim 31, wherein the method 
steps further comprise the step of using the dynamic update 
log to synchronize two copies of at least part of the database 
with each other. 